Genetic Algorithm-Based Intelligent Parameter Optimization for Apache Hadoop MapReduce Systems: A Machine Learning and Evolutionary Computing Approach

Author Details

Naga Charan Nandigama

Journal Details

Published: 11 December 2023 | Article Type: Research Article

Abstract

Apache Hadoop MapReduce remains a foundational distributed computing platform, yet its performance depends critically on the configuration of more than 190 interdependent parameters. Manual tuning is prohibitively time-consuming and often yields suboptimal settings, while existing automated approaches incur excessive computational overhead. This paper presents an intelligent optimization framework that integrates genetic algorithms (GA) with genetic programming (GP), machine learning classification, ensemble methods, and reinforcement learning to optimize Hadoop MapReduce configuration parameters automatically. The approach uses genetic programming to derive mathematical fitness functions from empirical MapReduce job execution data, then applies parallel genetic algorithm optimization with advanced selection, crossover, and mutation operators to search the parameter space efficiently. Experimental evaluation on a 4-node Hadoop 3.3.0 cluster shows substantial performance improvements: WordCount achieves a 63-69% reduction in execution time across 1 GB to 10 GB datasets, TeraSort achieves a 52-55% improvement, and Grep/Index applications achieve a 47-56% speedup. The framework optimizes eight critical parameters over 200 generations with a population size of 15, achieving convergence within 40,000 fitness evaluations. The contributions are fitness function generation through genetic programming, a parallel GA implementation for parameter optimization, ML-based parameter importance ranking, ensemble prediction models, and dynamic parameter adjustment mechanisms.

Keywords: Genetic Algorithm, Genetic Programming, Hadoop MapReduce, Parameter Optimization, Machine Learning, Evolutionary Computing, Big Data Performance, Configuration Tuning, Fitness Function Engineering, Ensemble Methods, Reinforcement Learning, Distributed Computing.
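
The genetic algorithm loop summarized in the abstract (selection, crossover, and mutation over eight parameters, population size 15, 200 generations) can be illustrated with a minimal sketch. Everything beyond those three numbers is an assumption made for illustration: the abstract does not name the eight tuned parameters, so the property names below are simply eight real Hadoop settings chosen as examples, their bounds are guesses, and `toy_cost` is a synthetic stand-in for the GP-derived fitness function, which is not given here. The sketch is also serial, whereas the paper describes a parallel GA.

```python
import random

# ASSUMPTION: eight integer-valued Hadoop properties with guessed (low, high)
# bounds. The abstract does not say which eight parameters were tuned.
SEARCH_SPACE = {
    "mapreduce.task.io.sort.mb": (50, 1000),
    "mapreduce.task.io.sort.factor": (10, 200),
    "mapreduce.job.reduces": (1, 64),
    "mapreduce.map.memory.mb": (512, 4096),
    "mapreduce.reduce.memory.mb": (512, 8192),
    "mapreduce.reduce.shuffle.parallelcopies": (5, 50),
    "mapreduce.map.cpu.vcores": (1, 8),
    "mapreduce.reduce.cpu.vcores": (1, 8),
}
NAMES = list(SEARCH_SPACE)
BOUNDS = [SEARCH_SPACE[name] for name in NAMES]

# ASSUMPTION: an arbitrary "ideal" configuration used only by the toy cost.
TOY_OPTIMUM = [400, 80, 16, 2048, 4096, 20, 2, 4]


def toy_cost(genome):
    """Synthetic stand-in for predicted job execution time (lower is better)."""
    return sum(((g - t) / (hi - lo)) ** 2
               for g, t, (lo, hi) in zip(genome, TOY_OPTIMUM, BOUNDS))


def random_genome(rng):
    """Draw one configuration uniformly at random within the bounds."""
    return [rng.randint(lo, hi) for lo, hi in BOUNDS]


def tournament(pop, costs, rng, k=3):
    """Tournament selection: best of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: costs[i])]


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with p = 0.5."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(genome, rng, rate=0.1):
    """Resample each gene within its bounds with probability `rate`."""
    return [rng.randint(lo, hi) if rng.random() < rate else g
            for g, (lo, hi) in zip(genome, BOUNDS)]


def optimize(pop_size=15, generations=200, seed=0, cost=toy_cost):
    """Run a serial GA; returns (best config, best cost, per-generation best)."""
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        costs = [cost(g) for g in pop]
        best = pop[min(range(pop_size), key=lambda i: costs[i])]
        history.append(min(costs))
        children = [best[:]]  # elitism: carry the best individual forward
        while len(children) < pop_size:
            child = crossover(tournament(pop, costs, rng),
                              tournament(pop, costs, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    costs = [cost(g) for g in pop]
    best = pop[min(range(pop_size), key=lambda i: costs[i])]
    return dict(zip(NAMES, best)), min(costs), history


if __name__ == "__main__":
    config, final_cost, _ = optimize()
    for name, value in config.items():
        print(f"{name} = {value}")
    print("toy cost:", final_cost)
```

In a real setting, `toy_cost` would be replaced by a model fitted to measured job runtimes, since evaluating each candidate by actually running a MapReduce job would dominate the search time.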

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright © Author(s) retain the copyright of this article.

How to Cite

Citation:

Naga Charan Nandigama. (2023-12-11). "Genetic Algorithm-Based Intelligent Parameter Optimization for Apache Hadoop MapReduce Systems: A Machine Learning and Evolutionary Computing Approach." *Volume 6*, 2, 1-8